Abstract: We analyze quantum broadcast channels, which are 
quantum channels with a single sender and many receivers. 
Focusing on channels with two receivers for simplicity, we 
generalize a number of results from the network Shannon theory 
literature which give the rates at which two receivers can receive 
a common message, while a personalized one is sent to one of 
them. Our first collection of results applies to channels with 
a classical input and quantum outputs. The second class of 
theorems we prove concerns sending a common classical message 
over a quantum broadcast channel, while sending quantum 
information to one of the receivers. The third group of results 
we obtain concerns communication over an isometry, giving the 
rates at which quantum information can be sent to one receiver, 
while common quantum information is sent to both, in the sense 
that tripartite GHZ entanglement is established. For each scenario, we 
provide an additivity proof for an appropriate class of channels, 
yielding single-letter characterizations of the appropriate regions. 
We conclude with applications of the recently discovered state 
merging primitive, obtaining achievable rates for distributing 
independent quantum information among the parties in various 
ways, both with and without the assistance of additional classical 
discussion among the receivers. 

In classical information theory, a discrete memoryless 
broadcast channel with a single sender Alice and two 
receivers, Bob and Charlie, is modeled by a probability 
transition matrix $p(y,z|x)$. The study of such channels was 
initiated by Cover in [4], where the idea of superimposing 
information in order to achieve rates of communication better 
than those achievable by naive timesharing protocols was 
introduced. There, it was also conjectured that the capacity 
region for sending common information at rate $R$, as well 
as independent personal information to each receiver at rates 
$R_Y$ and $R_Z$, over a degraded broadcast channel (see e.g. [5] 
for a definition) consists of those triples of nonnegative rates 
$(R, R_Y, R_Z)$ satisfying 

$$R_Y \le I(X;Y|T)$$
$$R + R_Z \le I(T;Z) \tag{1}$$

for some $p(t,x)$, where 
$|\mathcal{T}| \le \min\{|\mathcal{X}|,|\mathcal{Y}|,|\mathcal{Z}|\}$. Cover's 
conjecture was validated by a coding theorem of Bergmans 
[3] together with a particularly clever proof of the single-letter 
converse by Gallager [15]. 
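
As a rough numerical illustration of the region (1), the sketch below evaluates the two bounds for a degraded binary symmetric broadcast channel under the textbook superposition choice $X = T \oplus V$ with $T$ uniform and $V$ Bernoulli($\beta$). This particular channel, the parameter names `beta`, `p_y`, `p_z`, and the function names are assumptions made only for the example; they do not appear in the text above.

```python
import math


def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conv(a, b):
    """Binary convolution a*b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)


def superposition_bounds(beta, p_y, p_z):
    """Right-hand sides of (1) for an assumed binary symmetric example.

    Assumes T ~ Bernoulli(1/2), X = T xor V with V ~ Bernoulli(beta),
    Y = X xor Bernoulli(p_y) and Z = X xor Bernoulli(p_z).
    Returns (I(X;Y|T), I(T;Z)) in bits.
    """
    i_xy_given_t = h2(conv(beta, p_y)) - h2(p_y)
    i_tz = 1.0 - h2(conv(beta, p_z))
    return i_xy_given_t, i_tz


if __name__ == "__main__":
    # Sweep the superposition parameter to trace the trade-off.
    for beta in (0.0, 0.1, 0.25, 0.5):
        r_y, r_sum = superposition_bounds(beta, p_y=0.05, p_z=0.2)
        print(f"beta={beta:.2f}  R_Y <= {r_y:.4f}  R + R_Z <= {r_sum:.4f}")
```

Setting `beta = 0` devotes the whole codeword to the cloud center $T$, while `beta = 0.5` makes $X$ independent of $T$, so all of the rate goes to the better receiver.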

Such a result, however, gives no guarantee that the personal 
information being sent to each receiver will only be 
understandable at that receiver. One can require that only the 
intended receivers will be able to understand their own private 
messages. This problem was initially addressed by Wyner [31] 
in the context of wiretap channels. Due to a well-established 
[8] operational correspondence between privacy and quantum 
coherence, a result of particular relevance to ours is one by 
Csiszár and Körner [7] which shows that private information 
can be sent to Bob at rate $R_Y$, while public information is sent 
to both Bob and Charlie at rate $R$, over an arbitrary broadcast 
channel $p(y,z|x)$ if and only if there exists $p(t,v)p(x|v)$ such 
that 

$$0 \le R_Y \le I(V;Y|T) - I(V;Z|T)$$
$$0 \le R \le \min\{I(T;Y), I(T;Z)\}. \tag{2}$$

In this paper, we consider quantum generalizations of the 
above problems in various settings, as we did in a similar 
manner for multiple access channels in [32]. We begin by 
considering common and personalized messages over channels 
with a classical input and quantum outputs. We then analyze 
the capabilities of arbitrary quantum broadcast channels for 
sending a common classical message to Bob and Charlie, 
while also sending quantum information to Bob, in the sense 
of generating EPR entanglement [8]. Our bounds on the 
quantum rates should be compared to those of Csiszár and 
Körner for sending private information to Bob, due to the 
privacy-coherence correspondence. Next, we show that for 
isometric channels, the common classical message can be 
made coherent, enabling the generation of GHZ entanglement 
among the three participants. Finally, we establish achievable 
regions for certain variations of the previous scenario in which 
all parties may communicate classically with each other for 
free in order to obtain various quantum correlations among 
themselves, providing applications of the recently discovered 
state merging primitive [21] for quantum information. 

I. Preliminaries 

A. Classical and quantum systems 

Throughout this paper, we use labels such as $A$, $B$, $C$ to 
refer to quantum systems, writing $\mathcal{H}_A$ for the Hilbert space 
whose unit vectors correspond to the pure states of the quantum 
system $A$. All Hilbert spaces will be finite dimensional, 
and we abbreviate $\dim \mathcal{H}_A$ as $|A|$, so that 
$\mathcal{H}_A = \mathbb{C}^{|A|}$. Given 
two systems $A$ and $B$, the pure states of their composite 
system $AB$ correspond to unit vectors in 
$\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$. 
When we introduce a pure state, we use a superscripted label 
to identify the system to which the state refers. For example, 
$|\phi\rangle^A \in \mathcal{H}_A$ and $|\psi\rangle^{AB} \in \mathcal{H}_{AB}$. 
The same convention will be 
followed when the state of a quantum system $A$ is described 
by a density matrix, so that $\rho^A \in \mathbb{C}^{|A|\times|A|}$ is a nonnegative 
definite Hermitian matrix with $\mathrm{Tr}\,\rho^A = 1$. For a multipartite 
density matrix $\rho^{ABC}$, we frequently abbreviate its partial 
traces as $\rho^{AB} = \mathrm{Tr}_C\,\rho^{ABC}$. In later references to the global 
state, we frequently drop the superscript completely, although 
the partial traces will always have superscripts. We often use 
the abbreviation $\phi = |\phi\rangle\langle\phi|$ when referring to the rank-one 
density matrix corresponding to a pure state $|\phi\rangle$. 

For an arbitrary matrix $M \in \mathbb{C}^{d\times d}$, its trace norm $|M|_1$ is 
defined as the sum of its singular values, expressed as 
$|M|_1 = \mathrm{Tr}\sqrt{MM^\dagger}$. Given two states $\rho^A$ and $\sigma^A$, their 
trace distance $|\rho - \sigma|_1$ is the trace norm of their difference. 
We use the squared version of the fidelity, defined as 
$F(\rho,\sigma) = |\sqrt{\rho}\sqrt{\sigma}|_1^2$. When $\rho = |\phi\rangle\langle\phi|$ is pure, 
the fidelity evaluates to $F(|\phi\rangle,\sigma) = \langle\phi|\sigma|\phi\rangle$. 
These distances are related [14] via 

$$F(\rho,\sigma) \ge 1 - |\rho - \sigma|_1 \tag{3}$$
$$|\rho - \sigma|_1 \le 2\sqrt{1 - F(\rho,\sigma)}. \tag{4}$$

Since the trace distance comes from a norm, it satisfies the 
usual triangle inequality 

$$|\rho_1 - \rho_3|_1 \le |\rho_1 - \rho_2|_1 + |\rho_2 - \rho_3|_1.$$

We shall frequently make use of classical-quantum states 
and classical-quantum channels [10] in this paper. To any finite 
set $\mathcal{X}$, we associate a Hilbert space $\mathcal{H}_X$ with orthonormal basis 
$\{|x\rangle^X\}_{x\in\mathcal{X}}$, so that for any classical random variable $X$ which 
takes the value $x \in \mathcal{X}$ with probability $p(x)$, we may write a 
density matrix 

$$\rho^X = \sum_x p(x)\,|x\rangle\langle x|^X$$

which is diagonal in that basis. For any $S \subseteq \mathcal{X}$, writing $P_S$ for the 
projector onto the subspace spanned by $\{|x\rangle\}_{x\in S}$, we then 
have 

$$\Pr\{X \in S\} = \mathrm{Tr}\,P_S\,\rho^X = \sum_{x\in S} p(x).$$

An ensemble of quantum states $\{\rho_x^B, p(x)\}$ can be represented 
in a similar way with a block diagonal classical-quantum (cq) 
state 

$$\sigma^{XB} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \rho_x^B \equiv \bigoplus_x p(x)\,\rho_x^B.$$

Wherever possible, we will adopt the more compact direct sum 
notation $\bigoplus$ for describing cq states, with the understanding 
that the labels of the blocks correspond to states of an 
additional classical system. 

B. Channels 

A classical-quantum (cq) channel $W^{X\to B}$ describes a physical 
setup in which the sender Alice is able to remotely 
prepare any one of a collection of conditional density matrices 
$\{\rho_x^B\}_{x\in\mathcal{X}}$ in the laboratory of Bob. By a cq broadcast channel 
$W^{X\to BC}$ from Alice to Bob and Charlie, we mean a physical 
scenario in which Alice prepares any one of a collection of 
bipartite conditional density matrices $\{\rho_x^{BC}\}_{x\in\mathcal{X}}$. 

By a quantum channel $\mathcal{N}^{A'\to B}$ from $A'$ to $B$, we mean 
a trace-preserving linear map from density matrices on $A'$ to 
those on $B$ which is also completely positive. Such an operator 
may be referred to as a map in this text, usually when it does 
not represent a physical process between spatially separated 
parties. Here, we parallel the state convention by treating the 
superscript $A' \to B$ as a definition of the domain and range 
of the channel, to be omitted in later references to $\mathcal{N}$. In 
this paper, a quantum broadcast channel $\mathcal{N}^{A'\to BC}$ refers to 
a quantum channel with a single input and two outputs. We 
often personify the users of the channel, saying that Alice 
controls the input, while Bob and Charlie are located at the 
respective outputs. Defining the channel from Alice to Bob 
as $\mathcal{N}^{A'\to B} = \mathrm{Tr}_C\,\mathcal{N}^{A'\to BC}$, with a similar definition for 
$\mathcal{N}^{A'\to C}$, we will say that the broadcast channel $\mathcal{N}^{A'\to BC}$ is 
degraded whenever there exists a degrading channel $\mathcal{N}_d^{B\to C}$ 
from Bob to Charlie satisfying 

$$\mathcal{N}^{A'\to C} = \mathcal{N}_d^{B\to C} \circ \mathcal{N}^{A'\to B}.$$

In other words, the following diagram must commute: 

[Commutative diagram: $A' \to B$ via $\mathcal{N}^{A'\to B}$, $B \to C$ via $\mathcal{N}_d^{B\to C}$, and $A' \to C$ via $\mathcal{N}^{A'\to C}$.] 




Remark 1: In the classical literature, such channels have 
been called stochastically degraded, meaning that the random 
variables $X$, $Y$ and $Z$, analogous to $A$, $B$ and $C$ of the state 
$\rho^{ABC} = \mathcal{N}^{A'\to BC}(\phi^{AA'})$, form a Markov chain $X - Y - Z$. 
However, in a quantum Markov chain [18] $A - B - C$ with state 
$\rho^{ABC}$, there must exist a recovery map $\mathcal{M}^{B\to BC}$ satisfying 
$\mathcal{M}(\rho^{AB}) = \rho^{ABC}$. In our case we have the weaker condition 
$\mathcal{N}_d^{B\to C}(\rho^{AB}) = \rho^{AC}$. The two conditions are equivalent in the 
classical problem because classical information can be copied. 

Given a channel $\mathcal{N}^{A'\to B}$, there always exists an isometry 
$\mathcal{U}^{A'\to BE}$ into an unobservable environment which extends the 
channel, meaning that $\mathcal{N}^{A'\to B} = \mathrm{Tr}_E\,\mathcal{U}^{A'\to BE}$. We will call 
such an isometry an isometric extension of $\mathcal{N}^{A'\to B}$. While 
there are generally many choices for an isometric extension 
of a given channel, all are related via isometries on the 
environment $E$. On the other hand, any channel obtained by 
disregarding the output $B$ of such an isometric extension will 
be said to be complementary to $\mathcal{N}^{A'\to B}$, which we write 
$\mathcal{N}_c^{A'\to E} = \mathrm{Tr}_B\,\mathcal{U}^{A'\to BE}$. In case the isometric extension 
$\mathcal{U}^{A'\to BE}$ of $\mathcal{N}^{A'\to B}$ is a degraded broadcast channel, in 
the sense that there is a degrading map $\mathcal{N}_d^{B\to E}$ for which 
$\mathcal{N}_c^{A'\to E} = \mathcal{N}_d^{B\to E} \circ \mathcal{N}^{A'\to B}$, we will say that the channel 
$\mathcal{N}^{A'\to B}$ is degradable [9]. Concrete examples of degradable 
channels include erasure channels [2], qubit flip channels, and 
photon number splitting channels [16]. 

A particular class of degradable channels which are relevant 
to this paper are the generalized dephasing channels [9], 
[32]. These are channels $\mathcal{N}^{A'\to B}$ with $|A'| = |B|$ which act 
noiselessly on some common orthonormal basis $\{|x\rangle^{A'}, |x\rangle^B\}$. 
Such channels have an isometric extension 

$$\mathcal{U}^{A'\to BE} = \sum_x |\phi_x\rangle^E |x\rangle^B \langle x|^{A'}$$

for some (not necessarily orthogonal) normalized vectors 
$\{|\phi_x\rangle^E\}$, and a complementary channel acting as 

$$\mathcal{N}_c(\rho) = \sum_x \langle x|\rho|x\rangle\,\phi_x^E. \tag{5}$$

Writing 

$$\Delta^{A'\to B} : \rho \mapsto \sum_x |x\rangle\langle x|\rho|x\rangle\langle x|$$

for the completely dephasing channel, which sets to zero 
all off-diagonal matrix elements, any generalized dephasing 
channel $\mathcal{N}^{A'\to B}$ satisfies 

$$\mathcal{N}_c \circ \Delta = \mathcal{N}_c \tag{6}$$
$$H(\Delta(\rho)) \ge H(\mathcal{N}(\rho)). \tag{7}$$
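
A minimal sketch of the entropy inequality (7) for a single qubit may help fix ideas. It assumes the input state is parametrized as $\rho = \begin{pmatrix} a & c \\ \bar{c} & 1-a \end{pmatrix}$ and that the generalized dephasing channel multiplies the off-diagonal entry by the overlap of the two environment vectors; the function names and the numerical values are illustrative only.

```python
import math


def qubit_entropy(a, c_abs):
    """von Neumann entropy (bits) of [[a, c], [conj(c), 1 - a]] with |c| = c_abs.

    The caller is responsible for positivity: c_abs**2 <= a * (1 - a).
    """
    # Eigenvalues of a 2x2 unit-trace Hermitian matrix are (1 +/- r) / 2.
    r = min(1.0, math.sqrt((2 * a - 1) ** 2 + 4 * c_abs ** 2))
    entropy = 0.0
    for lam in ((1 + r) / 2, (1 - r) / 2):
        if lam > 0:
            entropy -= lam * math.log2(lam)
    return entropy


def dephasing_entropies(a, c_abs, overlap_abs):
    """Return (H(Delta(rho)), H(N(rho))) for a qubit generalized dephasing channel.

    overlap_abs is the modulus of the inner product of the two environment
    vectors, a number in [0, 1]; it scales the off-diagonal entry.
    """
    h_delta = qubit_entropy(a, 0.0)               # off-diagonals removed
    h_n = qubit_entropy(a, c_abs * overlap_abs)   # off-diagonals damped
    return h_delta, h_n


if __name__ == "__main__":
    # Pure state |+> with an assumed environment overlap of 0.6.
    h_delta, h_n = dephasing_entropies(a=0.5, c_abs=0.5, overlap_abs=0.6)
    print(h_delta, h_n)
```

Since damping the off-diagonal entry can only move the eigenvalues toward the diagonal values, the first returned entropy is never smaller than the second, in line with (7).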

For the decoding of classical information, we use (somewhat 
interchangeably) the notions of POVMs and quantum instruments 
$\mathcal{D}^{A\to BX}$. The latter is a quantum channel whose target 
is a cq system. Such a map can be specified in terms of a 
collection of (generally) trace-reducing maps $\{\mathcal{D}_x^{A\to B}\}$ 
for which $\sum_x \mathcal{D}_x$ is trace-preserving. The instrument then acts 
as $\mathcal{D}(\rho^A) = \bigoplus_x \mathcal{D}_x(\rho^A)$. Given a POVM $\{\Lambda_x\}$ on $A$, its 
associated measurement instrument $\mathcal{D}^{A\to X}$ has components 
acting as $\mathcal{D}_x(\rho^A) = \mathrm{Tr}\,\Lambda_x\rho$. 

C. Entropy and information quantities 

Let $\rho^{ABC}$ be any tripartite density matrix. We write 
$H(A)_\rho = H(\rho^A) = -\mathrm{Tr}(\rho^A \log \rho^A)$ for the von Neumann 
entropy of the reduced density matrix $\rho^A$, omitting the 
subscripted state when it is apparent. As is common with much 
of quantum Shannon theory, certain linear combinations of 
entropies of various subsystems of the joint state $\rho^{ABC}$ arise 
naturally in the characterizations of the various rate regions we 
will introduce. We review the essential ones here, beginning 
with the conditional entropy 

$$H(A|B) = H(AB) - H(B).$$

This quantity is defined in direct analogy to its counterpart in 
classical information theory, which is always nonnegative and can 
be regarded as an average entropy of conditional probability 
distributions. It is however clear that $H(A|B) = -1$ when 
evaluated on an EPR state $\frac{1}{\sqrt{2}}(|00\rangle^{AB} + |11\rangle^{AB})$, a fact 
which was long considered problematic for the use of conditional 
entropy in quantum information theory. Nonetheless, the 
negative of conditional entropy has been defined as the coherent 
information 

$$I(A\rangle B) = -H(A|B)$$

from $A$ to $B$, due to its utility in characterizing the capacity 
of a quantum channel for transmitting quantum information 
[24], [28], [8] as a certain optimization problem which always 
yields a nonnegative rate. Following [26], we may also write 

$$I_c(\rho^{A'}, \mathcal{N}^{A'\to B}) \equiv I(A\rangle B)_{\mathcal{N}(\phi)},$$

where $|\phi\rangle^{AA'}$ is any purification of $\rho^{A'}$. An operational 
interpretation of both positive and negative conditional 
entropies was recently given in [21], where the primitive of 
state merging was introduced, yielding a quantum counterpart 
to the classical Slepian-Wolf theorem for distributed data 
compression. We will use this merging primitive in Section II-D 
to transmit quantum information over a broadcast channel. 
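
The value $H(A|B) = -1$ on an EPR state can be checked with a few lines of arithmetic. The sketch below assumes a bipartite pure state given by its squared Schmidt coefficients, for which $H(AB) = 0$ and $H(B)$ is the Shannon entropy of those coefficients; the function names are illustrative.

```python
import math


def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy_pure(schmidt_probs):
    """H(A|B) = H(AB) - H(B) for a bipartite pure state.

    schmidt_probs holds the squared Schmidt coefficients, i.e. the
    eigenvalues of either reduced density matrix.
    """
    h_ab = 0.0                     # a pure global state has zero entropy
    h_b = shannon(schmidt_probs)   # entropy of the reduced state on B
    return h_ab - h_b


def coherent_information_pure(schmidt_probs):
    """I(A>B) = -H(A|B) for the same pure state."""
    return -conditional_entropy_pure(schmidt_probs)


if __name__ == "__main__":
    print(conditional_entropy_pure([0.5, 0.5]))   # EPR state
    print(conditional_entropy_pure([1.0]))        # product state
```

For a product state the conditional entropy vanishes, while any entanglement in a pure state makes it strictly negative.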

The mutual information and conditional mutual information 
are respectively defined as 

$$I(A;B) = H(A) - H(A|B) = H(A) + H(B) - H(AB)$$

and as 

$$I(A;B|C) = H(A|C) - H(A|BC) \tag{8}$$
$$= I(A;BC) - I(A;C)$$
$$= I(AC;B) - I(B;C).$$
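
The three expressions for the conditional mutual information can be checked numerically in the classical (diagonal) case. The sketch below does this for a small joint distribution over three bits; the distribution itself is a hypothetical example chosen for illustration, and the helper names are not from the text.

```python
import math
from itertools import product


def entropy(joint, keep):
    """Shannon entropy (bits) of the marginal of `joint` on positions `keep`."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cmi_three_ways(joint):
    """Evaluate I(A;B|C) by each of the three expressions in (8)."""
    A, B, C = 0, 1, 2

    def H(*idx):
        return entropy(joint, idx)

    # H(A|C) - H(A|BC)
    v1 = (H(A, C) - H(C)) - (H(A, B, C) - H(B, C))
    # I(A;BC) - I(A;C)
    v2 = (H(A) + H(B, C) - H(A, B, C)) - (H(A) + H(C) - H(A, C))
    # I(AC;B) - I(B;C)
    v3 = (H(A, C) + H(B) - H(A, B, C)) - (H(B) + H(C) - H(B, C))
    return v1, v2, v3


# Hypothetical example: C is a fair bit, A = C xor noise, B = A xor noise.
joint = {}
for c, n1, n2 in product((0, 1), repeat=3):
    a = c ^ n1
    b = a ^ n2
    p = 0.5 * (0.9 if n1 == 0 else 0.1) * (0.8 if n2 == 0 else 0.2)
    joint[(a, b, c)] = joint.get((a, b, c), 0.0) + p

if __name__ == "__main__":
    print(cmi_three_ways(joint))
```

All three values agree up to floating-point error and are nonnegative, as expected.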

By the strong subadditivity [23] of quantum entropy, it follows 
that mutual information and conditional mutual information 
are nonnegative. There are many equivalent formulations of 
strong subadditivity which we will now recall. By simple 
algebra, $I(A;B|C) \ge 0$ is seen to be equivalent to the 
inequality $H(A|BC) \le H(A|B)$, which is interpreted as 
saying that conditioning reduces entropy, and thus increases 
coherent information: $I(A\rangle BC) \ge I(A\rangle B)$. These can easily 
be used to derive either form of the data processing inequality, 
which say that given any channel $\mathcal{N}^{B\to C}$, 

$$I(A;B)_{\rho^{AB}} \ge I(A;C)_{\mathcal{N}(\rho^{AB})} \tag{9}$$
$$I(A\rangle B)_{\rho^{AB}} \ge I(A\rangle C)_{\mathcal{N}(\rho^{AB})}. \tag{10}$$

In other words, processing the output of a channel will never 
increase the mutual or coherent information over that channel. 
We remark that the first inequality above includes the Holevo 
bound [19] as a special case, since a measurement can be 
considered as a quantum channel with a strictly classical 
output. Note that (10) can also be written 

$$I_c(\rho^{A'}, \mathcal{N}^{A'\to B}) \ge I_c(\rho^{A'}, \mathcal{M}^{B\to C} \circ \mathcal{N}^{A'\to B})$$

for every $\rho^{A'}$, $\mathcal{N}^{A'\to B}$ and $\mathcal{M}^{B\to C}$. Finally, given a 
quadripartite system $A_1A_2B_1B_2$, the following inequality is implied 
by and also implies strong subadditivity: 

$$H(A_1A_2|B_1B_2) \le H(A_1|B_1) + H(A_2|B_2). \tag{11}$$

II. Main results 

A. Degraded message sets for cq channels 

In what follows, a sequence $x_1x_2 \cdots x_n$, with each $x_i$ 
belonging to some set $\mathcal{X}$, will be denoted by $x^n$. Using 
many instances of a cq broadcast channel $W^{X\to BC}$, suppose 
that Alice wishes to send a personal message to Bob while 
simultaneously sending an independent common message to 
Bob and Charlie. If $W$ has conditional density matrices $\rho_x^{BC}$, 
we define an $(R, R_B, n, \epsilon)$ code for $W$ to consist of an 
encoding $\{x^n(m,k) \in \mathcal{X}^n\}$ where $(m,k) \in 2^{nR} \times 2^{nR_B}$, 
a POVM $\{\Lambda_{mk}\}$ on $B^n$ and a POVM $\{\Lambda'_m\}$ on $C^n$ which 
satisfy 

$$\mathrm{Tr}\,\rho_{x^n(m,k)}(\Lambda_{mk} \otimes \Lambda'_m) \ge 1 - \epsilon$$

for every $(m,k) \in 2^{nR} \times 2^{nR_B}$. A rate pair $(R, R_B)$ is 
achievable if there is a sequence of $(R, R_B, n, \epsilon_n)$ codes with 
$\epsilon_n \to 0$. The classical capacity region $C(W)$ of $W$ is defined 
as the closure of the collection of all such achievable rate 
pairs. 

In Theorem 1, we give a regularized expression for C(W), 
generalizing a coding theorem from [3], though we also prove 
a multi-letter converse. Theorem 2 shows that for a class 
of degraded cq channels, that characterization can be single- 
letterized, generalizing the classical converse of [15]. 

Theorem 1: Given a cq channel $W$ with conditional 
density matrices $\{\rho_x^{BC}\}$, $C(W)$ equals the closure of the pairs 
of nonnegative rates $(R, R_B)$ which satisfy 

$$R_B \le I(X^k;B^k|T)_\sigma / k$$
$$R \le \min\{I(T;B^k)_\sigma, I(T;C^k)_\sigma\} / k$$

for some $k \ge 1$ and some $p(t,x^k)$ giving rise to 

$$\sigma^{TX^kB^kC^k} = \bigoplus_{t,x^k} p(t,x^k)\,\rho_{x^k}^{B^kC^k} \tag{12}$$

with $|T| \le \min\{|\mathcal{X}|^k, |B|^{2k} + |C|^{2k} - 1\}$. Here, 
$\rho_{x^k} = \rho_{x_1} \otimes \cdots \otimes \rho_{x_k}$. 

Theorem 2: Suppose that the conditional density matrices 
$\{\rho_x^{BC}\}$ of a cq channel $W^{X\to BC}$ are such that their restrictions 
$\{\rho_x^B\}$ mutually commute, and that the restrictions $\{\rho_x^C\}$ satisfy 
$\rho_x^C = \mathcal{M}(\rho_x^B)$ for some channel $\mathcal{M}^{B\to C}$. Then $C(W)$ consists 
of those pairs of nonnegative rates $(R, R_B)$ satisfying 

$$R_B \le I(X;B|T)_\sigma$$
$$R \le I(T;C)_\sigma$$

for some state $\sigma^{TXBC}$ of the form (12) with $k = 1$ and 
$|T| \le \min\{|\mathcal{X}|, |B|^2\}$. 

B. Classical-quantum region $CQ(\mathcal{N})$ for quantum channels 

In this scenario, Alice wishes to send quantum information 
to Bob at rate $Q$, while sending common classical information 
to Bob and Charlie at rate $R$. To this end, she prepares one of many 
states $\{|\Upsilon_m\rangle^{AA'^n}\}$ which are entangled between a system $A$ 
in her laboratory and the inputs of some large number of 
parallel identical broadcast channels. Bob employs a quantum 
instrument $\mathcal{D}_1^{B^n\to \hat{A}M_B}$ with the goal of learning the classical 
message, as well as holding the $\hat{A}$ part of a highly entangled 
state. Meanwhile, Charlie performs a measurement, modeled 
by the instrument $\mathcal{D}_2^{C^n\to M_C}$, to learn the common classical 
message. Such components will be said to comprise a 
$(Q, R, n, \epsilon)$ cq entanglement generation code for the broadcast 
channel $\mathcal{N}^{A'\to BC}$ if, for $|m\rangle^{M_BM_C} = |m\rangle^{M_B}|m\rangle^{M_C}$, 

$$F\big(|m\rangle^{M_BM_C}|\Phi_Q\rangle^{A\hat{A}},\ (\mathcal{D}_1 \otimes \mathcal{D}_2) \circ \mathcal{N}^{\otimes n}(\Upsilon_m^{AA'^n})\big) \ge 1 - \epsilon$$

for each $m$, where $|\Phi_Q\rangle^{A\hat{A}}$ is some fixed rate $Q$ EPR state 

$$|\Phi_Q\rangle^{A\hat{A}} = \frac{1}{\sqrt{2^{nQ}}} \sum_{a=1}^{2^{nQ}} |a\rangle^A |a\rangle^{\hat{A}}.$$

A pair of nonnegative rates $(Q, R)$ is called an achievable cq 
rate pair for entanglement generation if there exists a sequence 
of $(Q, R, n, \epsilon_n)$ cq entanglement generation codes with 
$\epsilon_n \to 0$. The cq capacity region for entanglement generation $CQ(\mathcal{N})$ 
is defined as the closure of the set of such achievable cq 
rate pairs. The following theorem describes $CQ(\mathcal{N})$ of any 
broadcast channel as a regularized union of rectangles. 

Theorem 3: Let $\mathcal{N}^{A'\to BC}$ be arbitrary. Then $CQ(\mathcal{N})$ is 
equal to the closure of the collection of pairs of nonnegative 
cq rates $(Q, R)$ satisfying 

$$Q \le I(A\rangle B^kT)_\sigma / k$$
$$R \le \min\{I(T;B^k)_\sigma, I(T;C^k)_\sigma\} / k$$

for some $k \ge 1$ and some state 

$$\sigma^{TAB^kC^k} = \bigoplus_t p(t)\,\mathcal{N}^{\otimes k}(\phi_t^{AA'^k}) \tag{13}$$

arising from the action of $\mathcal{N}^{\otimes k}$ on the $A'^k$ parts of some 
bipartite pure state ensemble $\{p(t), |\phi_t\rangle^{AA'^k}\}$. Moreover, it 
suffices to take $|T| \le \min\{|A'|^{2k}, |B|^{2k} + |C|^{2k} - 1\}$. 

Our next theorem gives a single-letter characterization of 
CQ whenever Charlie holds part of the environment of a 
generalized dephasing channel from Alice to Bob. 

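Theorem 4: Let $\mathcal{N}^{A'\to BC}$ have an isometric extension 

$$\mathcal{U} = \sum_x |x\rangle^B |\phi_x\rangle^{CE} \langle x|^{A'}$$

so that $\mathcal{N}^{A'\to B}$ is a generalized dephasing channel. Then 
$CQ(\mathcal{N})$ equals those pairs of nonnegative cq rates $(Q, R)$ 
satisfying 

$$Q \le H(X|T) - H(CE|T)$$
$$R \le I(T;C)$$

for some state of the form 

$$\bigoplus_{t,x} p(t,x)\,\phi_x^{CE}$$

with $|T| \le |\mathcal{X}|$. 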

In particular, this theorem applies to any isometric extension 
of the following pinching channel 
$\mathcal{P} : \mathbb{C}^{3\times 3} \to \mathbb{C}^{3\times 3}$, which 
acts by setting some matrix elements to zero, while leaving 
the others alone. 
Mathematically, this channel is a completely positive 
trace-preserving conditional expectation of $\mathbb{C}^{3\times 3}$ onto a 
$*$-subalgebra which is spatially isomorphic to $\mathbb{C}^{2\times 2} \oplus \mathbb{C}$. For the 
broadcast channel corresponding to any isometric extension 
$\mathcal{U}_\mathcal{P}^{A'\to BC}$ of $\mathcal{P}$ where Charlie obtains the entire environment of 
$\mathcal{P}$, a straightforward derivation reveals that the outer boundary 
of $CQ(\mathcal{U}_\mathcal{P})$ is given by 

$$Q_B = p$$
$$R = \begin{cases} 1 & \text{if } p \le 1/2 \\ H(p) & \text{if } p \ge 1/2 \end{cases}$$

where $0 \le p \le 1$, as is shown in Figure 1. 
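
The boundary just described is easy to tabulate. The following sketch assumes the parametrization $Q_B = p$ with $R = 1$ for $p \le 1/2$ and $R = H(p)$ (binary entropy, in bits) for $p \ge 1/2$; the function names are illustrative.

```python
import math


def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def pinching_boundary_point(p):
    """Return (Q_B, R) on the assumed outer boundary for parameter p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q_b = p
    r = 1.0 if p <= 0.5 else h2(p)
    return q_b, r


if __name__ == "__main__":
    # Tabulate a few boundary points; the curve is flat up to p = 1/2,
    # then follows the binary entropy down to R = 0 at p = 1.
    for i in range(0, 11):
        q_b, r = pinching_boundary_point(i / 10)
        print(f"Q_B = {q_b:.1f}   R = {r:.4f}")
```

The two branches meet continuously at $p = 1/2$, where $H(1/2) = 1$.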

C. Quantum region $Q(\mathcal{N})$ for quantum channels 

Here, Alice attempts to share a large bipartite entangled 
state with Bob, while also trying to build a large GHZ state 
with Bob and Charlie. Alice encodes by preparing the state 
$|\Upsilon\rangle^{AGA'^n}$, entangled with the inputs of a large number $n$ of 
instances of $\mathcal{N}^{A'\to BC}$. Bob and Charlie employ respective 
decoding maps $\mathcal{D}_1^{B^n\to \hat{A}G_B}$ and $\mathcal{D}_2^{C^n\to G_C}$. These components 
comprise a $(Q, Q_B, n, \epsilon)$ entanglement generation code for 
the broadcast channel $\mathcal{N}$ if they generate a rate $Q_B$ EPR state 
$|\Phi_{Q_B}\rangle^{A\hat{A}}$ and a rate $Q$ GHZ state 

$$|\Gamma_Q\rangle^{GG_BG_C} = \frac{1}{\sqrt{2^{nQ}}} \sum_{m=1}^{2^{nQ}} |m\rangle^G |m\rangle^{G_B} |m\rangle^{G_C}$$

in the sense that 

$$F\big(|\Phi_{Q_B}\rangle^{A\hat{A}}|\Gamma_Q\rangle^{GG_BG_C},\ (\mathcal{D}_1 \otimes \mathcal{D}_2) \circ \mathcal{N}^{\otimes n}(\Upsilon^{GAA'^n})\big) \ge 1 - \epsilon.$$

[Fig. 1. CQ for pinching channel; axes $Q_B$ and $R$.] 

Achievable rates and the capacity region $Q(\mathcal{N})$ are defined 
in analogy to the earlier scenarios. In Theorem 5, we give 
a multi-letter formula for $Q(\mathcal{N})$ in the case where $\mathcal{N}$ is an 
isometry. Theorem 6 derives a single-letter formula for $Q(\mathcal{N})$ 
in case $\mathcal{N}$ is an isometric extension of a generalized dephasing 
channel to Bob. Note that these results can be regarded 
as dynamic analogs of those obtained by [29], who study 
distillation of EPR and GHZ entanglement from arbitrary 
tripartite pure states. While those authors allow for additional 
classical communication, we do not. Similar correspondences 
exist in the literature, such as between [10] and [8], as well 
as between [21] and [32]. 

Theorem 5: Let $\mathcal{U}^{A'\to BC}$ be an isometric broadcast channel. 
Then $Q(\mathcal{U})$ is given by the closure of the pairs of 
nonnegative quantum rates $(Q, Q_B)$ satisfying 

$$Q_B \le I(A\rangle B^kT)_\sigma / k$$
$$Q \le \min\{I(T;B^k)_\sigma, I(T;C^k)_\sigma\} / k$$

where $\sigma^{TAB^kC^k}$ takes the same form as in (13), replacing $\mathcal{N}$ 
with $\mathcal{U}$. The bound on $|T|$ is the same as well. 

Theorem 6: Let $\mathcal{U}^{A'\to BC}$ be a broadcast channel which is 
an isometric extension of a generalized dephasing channel to 
$B$, written 

$$\mathcal{U} = \sum_x |x\rangle^B |\phi_x\rangle^C \langle x|^{A'}.$$

Then $Q(\mathcal{U})$ equals the set of pairs of nonnegative quantum 
rates $(Q_B, Q)$ satisfying 

$$Q_B \le H(X|T)_\omega - H(C|T)_\omega$$
$$Q \le I(T;C)_\omega$$

where 

$$\omega^{TXC} = \bigoplus_{t,x} p(t,x)\,\phi_x^C$$

and $|T| \le |\mathcal{X}|$. 

When $\mathcal{U}^{A'\to BC}$ is an isometric extension of the pinching 
channel $\mathcal{P}^{A'\to B}$, this theorem yields the rate region from 
Figure 1 with $R$ replaced by $Q$. 

Remark 2: Using the standard technique of restricting to a 
high-fidelity subspace of the input, it is possible to strengthen 
the previous four theorems to obtain stronger error criteria, 
such as that from the strong subspace transmission of [32]. 
We have, however, focused on entanglement generation for 
simplicity. 

D. Achievable quantum rates from state merging 

Suppose that three players, Alice, Bob and Charlie, share 
the respective parts of many instances of a tripartite pure 
state $|\psi\rangle^{ABC}$. Assuming that Alice can send classical bits to 
Bob for free, it was recently shown [21] that the quantum 
communication cost for Alice to transfer her $A^n$ systems to 
Bob is asymptotically equal to $H(A|B)$, regardless of the 
negativity of the expression. Specifically, whenever $H(A|B)$ 
is negative (or equivalently, when $I(A\rangle B) > 0$), Alice 
and Bob can generate EPR entanglement at rate $I(A\rangle B)$ in 
the process of transferring $A^n$ to Bob, using only classical 
communication and no quantum communication whatsoever. 
This is an improvement over standard entanglement distillation 
[11], where the same amount of EPR entanglement is obtained 
without deliberately trying to accomplish state merging. On 
the other hand, in case $H(A|B) > 0$, the protocol requires 
as input Alice-Bob EPR entanglement at a rate of at least 
$H(A|B)$ ebits per system to be transferred. Therefore, if Alice 
and Bob performed state merging yesterday at a negative 
cost, they can use the extra entanglement they generated to 
perform merging with a positive cost today. As in [21], we 
only directly consider state merging when the cost is negative, 
as the protocol with positive cost is obtained by having Alice 
establish an appropriate amount of pure entanglement with 
Bob, so that the total coherent information they share becomes 
positive. 

Formally, a negative cost state merging protocol for a 
state $|\psi\rangle^{ABC}$ consists of an instrument $\mathcal{M}^{A^n\to DM}$ with 
components $\mathcal{M}_m^{A^n\to D}$ to be performed on Alice's systems $A^n$, 
together with a collection of decoding operations 
$\mathcal{D}_m^{B^n\to B^n\hat{A}^n\hat{D}}$ 
for Bob. The quantum outputs $D$ and $\hat{D}$ hold Alice's and 
Bob's respective halves of the entanglement resulting from 
the protocol, while each copy of $\hat{A}$ corresponds to a system 
located in Bob's laboratory which is isomorphic to $A$, whose 
purpose is to hold the corresponding part of the transferred 
states. The protocol proceeds as follows. Alice performs the 
instrument $\mathcal{M}^{A^n\to DM}$ and tells the classical result $M$ to Bob 
who, depending on the classical data he receives, uses the 
appropriate decoding map. These components will be said to 
comprise a $(Q, n, \epsilon)$ negative cost state merging protocol for 
$|\psi\rangle^{ABC}$ if $|D| = |\hat{D}| = 2^{nQ}$ and 

$$F\Big(|\psi\rangle^{\otimes n}|\Phi_Q\rangle^{D\hat{D}},\ \sum_m (1^{C^n} \otimes \mathcal{D}_m \otimes \mathcal{M}_m)(\psi^{\otimes n})\Big) \ge 1 - \epsilon,$$

where $|\Phi_Q\rangle^{D\hat{D}}$ is a rate $Q$ maximally entangled state. The 
following proposition is from [21], and is proved in [22]. 

Proposition 1: Let a pure tripartite state $|\psi\rangle^{ABC}$ satisfying 
$I(A\rangle B) > 0$ be given. Then, for every $\epsilon > 0$, and every 
$0 \le Q < I(A\rangle B)$, there is $n$ sufficiently large so that there exists 
a $(Q, n, \epsilon)$ negative cost state merging protocol for $|\psi\rangle^{ABC}$. 

We now state a theorem. 



Theorem 7: Let $\mathcal{N}^{A'\to BC}$ be arbitrary. If Bob can 
communicate for free with Charlie via a classical channel, then Alice 
may generate rate $Q_C$ entanglement with Charlie whenever 
there is a bipartite pure state $|\psi\rangle^{AA'}$ for which 

$$Q_C < I(A\rangle BC)_\sigma \quad \text{and} \quad I(B\rangle C)_\sigma > 0,$$

where $\sigma^{ABC} = \mathcal{N}(\psi^{AA'})$. In addition, the same protocol 
allows Bob and Charlie to generate independent EPR 
entanglement between themselves at any rate less than $I(B\rangle C)_\sigma$. 
Here, we give an outline of the proof: 

Proof: Assume that Alice and Charlie share common 
randomness. Fixing a single-letter reference state $|\psi\rangle^{AA'}$ 
satisfying the conditions of the theorem, Alice uses a random 
LSD code of rate $I(A\rangle BC)_\sigma$ (see Proposition 4) based on her 
common randomness with Charlie, pretending as though Bob 
and Charlie can collaborate in their decoding. As her average 
code density matrix is close to the product state $(\psi^{A'})^{\otimes n}$, 
the output state of Bob and Charlie is close to $(\sigma^{BC})^{\otimes n}$. By 
assumption, $I(B\rangle C)_\sigma > 0$, so there is a negative cost for Bob 
to transfer his $B^n$ systems to Charlie. This means that Bob and 
Charlie can distill EPR pairs at any rate less than $I(B\rangle C)_\sigma$ during 
this process. Charlie uses the common randomness to decode 
the random LSD code, thus establishing the rate $I(A\rangle BC)_\sigma$ 
entanglement with Alice. Finally, the protocol is derandomized 
using standard arguments. ■ 

Finally, we demonstrate that Alice may generate, and also 
transmit [1], independent entanglement between herself and 
each receiver without the assistance of classical communication 
between the two receivers. 

Theorem 8: Let $\mathcal{N}^{A'\to BC}$ be arbitrary, and let $|\psi\rangle^{A_BA_CA'}$ 
be entangled between local systems $A_B$ and $A_C$ in Alice's lab 
and the $A'$ input to the channel. Provided that $I(A_B\rangle B)_{\mathcal{N}(\psi)}$ 
and $I(A_C\rangle C)_{\mathcal{N}(\psi)}$ are positive, Alice may generate those 
same amounts of independent entanglement with each receiver. 
We will be content to outline the proof: 

Proof: If communication is allowed between Alice and 
each of the receivers, the theorem is immediate from a double 
application of Proposition 1 (or, rather, entanglement 
distillation [11], as the state merging aspect is not needed). We now 
argue that the classical communication is not needed. Alice 
begins such a protocol by applying instruments 
$\mathcal{E}^{A_B^n\to \hat{B}M}$ and $\mathcal{E}'^{A_C^n\to \hat{C}K}$ with components 
$\{\mathcal{E}_m^{A_B^n\to \hat{B}}\}_m$ and $\{\mathcal{E}'^{A_C^n\to \hat{C}}_k\}_k$ to 
the $A_B$ and $A_C$ parts of the state 
$\sigma^{A_B^nA_C^nB^nC^n} = (\mathcal{N}(\psi))^{\otimes n}$. 
Conditioned on receiving the classical message $M = m$, 
Bob performs $\mathcal{D}_m$ on $B^n$. Conditioned on receiving the classical 
message $K = k$, Charlie performs $\mathcal{D}'_k$ on $C^n$. By entanglement 
distillation, there exist $m$ and $k$ such that applying $\mathcal{D}_m \otimes \mathcal{D}'_k$ 
to $(\mathcal{E}_m \otimes \mathcal{E}'_k)(\sigma)/\mathrm{Tr}(\mathcal{E}_m \otimes \mathcal{E}'_k)(\sigma)$ gives a state close to the 
tensor product of the two desired maximally entangled states. 
Thus, Alice could have prepared 

$$\Upsilon_{mk}^{\hat{B}\hat{C}A'^n} = (\mathcal{E}_m \otimes \mathcal{E}'_k)(\psi^{\otimes n})/\mathrm{Tr}(\mathcal{E}_m \otimes \mathcal{E}'_k)(\psi^{\otimes n})$$

in the first place, eliminating the need for classical 
communication. If we are interested in entanglement transmission 
instead of entanglement generation, then Alice is given a 
purification of the $\hat{B}\hat{C}$ systems rather than being able to prepare 
them directly. Luckily, she may always produce $\Upsilon_{mk}^{\hat{B}\hat{C}A'^n}$ by a 
(possibly noisy) encoding $\mathcal{T}$. A direct adaptation of the result 
of [1] guarantees that $\mathcal{T}$ may be replaced by an isometry. It 
would be desirable to have a direct proof of this theorem (cf. 
[8] for the single user case) instead of invoking entanglement 
distillation and [1]. ■ 

Remark 3: The regularized optimization over such 
$|\psi\rangle^{A_BA_CA'}$ yields the capacity region when there is 
no Bob-Charlie communication, although the resulting 
characterization of this capacity region is unlikely to be the 
most useful. 

III. Proofs of main results 

Let us first state some auxiliary results on which our proofs 
rely. 

Lemma 1 (Gentle measurement (average version) [30]): 
Let $\rho$, $\Lambda$ be random $d \times d$ matrices such that $\rho$ is a density 
matrix and $0 \le \Lambda \le 1$ which satisfy $\mathbb{E}\,\mathrm{Tr}\,\Lambda\rho \ge 1 - \epsilon$. Then 

$$\mathbb{E}\,|\sqrt{\Lambda}\rho\sqrt{\Lambda} - \rho|_1 \le \sqrt{8\epsilon}.$$

Lemma 2 (Gentle coherent measurement [17]): Suppose 
that a POVM $\{\Lambda_m\}$ identifies the elements of a set of pure 
states $\{|\phi_m\rangle^B\}$, in the sense that $\mathrm{Tr}\,\Lambda_m\phi_m \ge 1 - \epsilon$ for 
every $m$. Then, there is an isometry $\mathcal{V}^{B\to B\hat{B}}$ which satisfies 

$$\langle m|\langle\phi_m|\mathcal{V}|\phi_m\rangle \ge 1 - \epsilon \quad \text{for each } m.$$

Lemma 3 (Continuity Lemma): If $\rho^{AB}$ and $\sigma^{AB}$ satisfy 
$|\rho - \sigma|_1 \le \delta$ for some $0 \le \delta \le 1/e$, then the following 
inequalities hold: 

$$|H(A|B)_\rho - H(A|B)_\sigma| \le 2H(\delta) + 4\delta\log|AB|$$
$$|I(A;B)_\rho - I(A;B)_\sigma| \le 3H(\delta) + 6\delta\log|AB|.$$

Proof: Fannes [13] has shown that 

$$|H(\rho^{AB}) - H(\sigma^{AB})| \le H(\delta) + 2\delta\log|AB|.$$

By monotonicity, the trace distances between partial traces of 
$\rho$ and $\sigma$ are no greater than $\delta$, so after expanding the conditional 
entropies and mutual informations, we may apply the triangle 
inequality, proving the result. ■ 

We will also need the following two lemmas, which are 
respectively proved as Lemmas 1 and 2 of [32]: 

Lemma 4: Suppose $\rho, \sigma, \Lambda \in \mathcal{B}(\mathcal{H})$, where $\rho$ and $\sigma$ are 
density matrices, and $0 \le \Lambda \le 1$. Then, 

$$\mathrm{Tr}\,\Lambda\sigma \ge \mathrm{Tr}\,\Lambda\rho - |\rho - \sigma|_1.$$

Lemma 5: For any state $\rho^{AB}$ with partial traces $\rho^A$ and $\rho^B$ 
and any $|\psi\rangle^A$ and $\sigma^B$, we have 

$$F(\rho^{AB}, \psi^A \otimes \sigma^B) \ge 1 - 3\big(1 - F(|\psi\rangle^A, \rho^A)\big) - |\rho^B - \sigma^B|_1.$$

Next, we state an average error version of the HSW Theorem 
for cq codes with codewords chosen i.i.d. according to a 
product distribution [27], [20]. 

Proposition 2 (HSW Random Coding Theorem): Given is 
a cq state $\sigma^{XQ} = \bigoplus_x p(x)\,\rho_x^Q$ and a number 
$0 \le R < I(X;Q)_\sigma$. For every $\epsilon > 0$, there is $n$ sufficiently large 
so that if $2^{nR}$ codewords $\mathcal{C} = \{X^n(m)\}$ are chosen i.i.d. 
according to the product distribution $p(x^n) = \prod_{i=1}^n p(x_i)$, 
corresponding to input preparations 
$\rho_{x^n} = \rho_{x_1} \otimes \cdots \otimes \rho_{x_n}$, there 
exists a decoding POVM $\{\Lambda_m\}$ on $Q^n$, depending on the 
random choice of codebook $\mathcal{C}$, which correctly identifies the 
index $m$ with average probability of error less than $\epsilon$, in the 
sense that 

$$\mathbb{E}_\mathcal{C}\, 2^{-nR} \sum_{m=1}^{2^{nR}} \mathrm{Tr}\,\rho_{X^n(m)}\Lambda_m \ge 1 - \epsilon. \tag{14}$$

The following proposition is a classical-quantum analog of 
Corollary 3.8 from [6]. 

Proposition 3: Let $\{\rho_x^{BC}\}_{x\in\mathcal{X}}$ be a cq channel $W^{X\to BC}$, 
and let $p(x)$ and $\epsilon, \delta > 0$ be given. If 

$$0 \le R = \min\{I(X;B), I(X;C)\} - \delta$$

and $n$ is large enough, there is a set of $2^{nR}$ HSW codewords 
$\{x^n(m)\}$, each of the same type $P$ satisfying $|P - p|_1 \le \delta$, a 
measurement on $B^n$ with POVM $\{\Lambda_m\}$ and a measurement 
on $C^n$ with POVM $\{\Lambda'_m\}$ such that for each $m$, 

$$\mathrm{Tr}(\Lambda_m \otimes \Lambda'_m)\rho_m \ge 1 - \epsilon$$

where $\rho_m = \bigotimes_i \rho_{x_i(m)}$. 

Proof: Follows from standard arguments (see e.g. [29]). ■ 

The following quantum coding proposition for single-user 
channels is proved in [8] and concerns the existence of random 
entanglement transmission codes whose average code density 
matrix can be made arbitrarily close to a product state. 

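Proposition 4 (LSD Random Coding Theorem): Given is a channel $\mathcal{N} : A' \to B$, a density matrix $\rho^{A'}$, and a number $0 \le R < I_c(\rho, \mathcal{N})$. For every $\epsilon > 0$, there is $n$ sufficiently large so that there is a random ensemble of $(2^{nR}, n, \epsilon)$ entanglement generation codes $(p_\beta, |\Upsilon_\beta\rangle^{AA'^n}, \mathcal{D}_\beta^{B^n \to \hat{A}})$ for $\mathcal{N}$ with average code density operator $\varrho^{A'^n} = \sum_\beta p_\beta \mathrm{Tr}_A \Upsilon_\beta$ satisfying $\|\varrho - \rho^{\otimes n}\|_1 \le \epsilon$. Moreover, each code in the ensemble is good, in the sense that

$$F\bigl(|\Phi\rangle^{A\hat{A}}, \mathcal{D}_\beta \circ \mathcal{N}^{\otimes n}(\Upsilon_\beta^{AA'^n})\bigr) \ge 1 - \epsilon$$

for each value of the randomness $\beta$.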
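The rate bound $I_c(\rho, \mathcal{N})$ in Proposition 4 can be evaluated in closed form for simple channels. The sketch below does so for a qubit dephasing channel $\mathcal{N}(\rho) = (1-p)\rho + pZ\rho Z$ with a diagonal input $\rho = \mathrm{diag}(q, 1-q)$, using $I_c = H(B) - H(AB)$ on the state obtained by sending half of a purification through the channel. The choice of channel, the parametrization by $p$ and $q$, and the function names are assumptions made for illustration.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def dephasing_coherent_information(q, p):
    """I_c(rho, N) for rho = diag(q, 1-q) and N(rho) = (1-p) rho + p Z rho Z."""
    # Dephasing leaves a diagonal input unchanged, so H(B) = h2(q).
    h_b = h2(q)
    # On span{|00>, |11>} the joint state is [[q, off], [off, 1-q]],
    # with the off-diagonal term damped by the factor (1 - 2p).
    off = (1 - 2 * p) * math.sqrt(q * (1 - q))
    radius = math.sqrt((q - 0.5) ** 2 + off ** 2)
    h_ab = h2(0.5 + radius)  # eigenvalues are 0.5 +/- radius
    return h_b - h_ab

ic_noiseless = dephasing_coherent_information(0.5, 0.0)
ic_fully_dephased = dephasing_coherent_information(0.5, 0.5)
ic_quarter = dephasing_coherent_information(0.5, 0.25)
```

For the maximally mixed input the value reduces to $1 - h_2(p)$, so it runs from one qubit per use with no noise down to zero at $p = 1/2$.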

Remark 4: In practice, we omit the common randomness 
index, treating the encoding and decoding as a pair of corre- 
lated random objects. 

A. Proof of Theorem 1 

Proof: (Coding theorem) Let $W^{X \to BC}$ be a cq broadcast channel with conditional density matrices $\rho_x^{BC}$ and let $p(t, x)$ be arbitrary. Together, these probabilities and states define the joint cq state

$$\sigma^{TXBC} = \bigoplus_{t,x} p(t,x)\rho_x^{BC} = \bigoplus_t p(t)\sigma_t^{XBC}.$$

The corresponding conditional distribution $p(x|t)$ defines a set of conditional density matrices

$$\tau_t^{BC} = \sum_x p(x|t)\rho_x^{BC} = \mathrm{Tr}_X \sigma_t^{XBC}$$

for a new cq channel $V^{T \to BC}$, representing a "backed up" version of the original channel $W$. Note that these conditional density matrices can be used to rewrite

$$\sigma^{TBC} = \mathrm{Tr}_X \sigma^{TXBC} = \bigoplus_t p(t)\tau_t^{BC}.$$

For any $\epsilon, \delta > 0$ and sufficiently large $n$, we will show that for rates $R_B$ and $R$ satisfying

$$I(X;B|T)_\sigma - (1 + |\mathcal{X}|)\delta \le R_B \le I(X;B|T)_\sigma$$

and

$$0 \le R = \min\{I(T;B)_\sigma, I(T;C)_\sigma\} - \delta,$$

there exists an $(R, R_B, n, 2\epsilon^{1/8})$ code for $W^{X \to BC}$.

We will construct the required doubly-indexed set of codewords $\{x^n(m,k)\}_{m \in 2^{nR},\, k \in 2^{nR_B}}$ as follows. First, we select a rate $R$ code for the channel $V^{T \to BC}$ which conveys the index $m \in 2^{nR}$ to Bob and Charlie. Then, for each $t$, we pick a random HSW code of blocklength approximately $p(t)n$ for $W^{X \to BC}$ with codewords selected i.i.d. according to $p(x|t)$, such that if Bob knows $t$, he can decode at rates approaching $I(X;B)_{\sigma_t}$. Note that because of the randomness in this second coding layer, the average state seen by Bob on any channel output where the $t$'th code was used is equal to $\tau_t^B$.

To decode, Bob and Charlie first use their measurements 
from the common code, allowing them to identify m well 
on average. In addition to knowing the common message m, 
Bob then knows which instances of the channel were used 
with which random codes, so that he can apply an appropriate 
decoder, which depends on the randomness in the second 
coding layer, to learn his personalized message k. Note that 
since

$$I(X;B|T)_\sigma = \sum_t p(t)\, I(X;B)_{\sigma_t},$$

the personal rate to Bob will be near that which is desired. 
We then infer the existence of a deterministic code with low 
error probability for all message pairs. 
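The identity $I(X;B|T)_\sigma = \sum_t p(t) I(X;B)_{\sigma_t}$ used above can be checked numerically in a purely classical stand-in, where Bob's output is a random variable $Y$ rather than a quantum system. The distributions below are arbitrary illustrative values, and the helpers `H` and `marginal` are hypothetical names introduced for this sketch.

```python
import math
from itertools import product

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginal of a joint distribution onto the coordinates listed in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

p_t = [0.25, 0.75]                      # p(t)
p_x_given_t = [[0.5, 0.5], [0.9, 0.1]]  # p(x|t)
channel = [[0.9, 0.1], [0.2, 0.8]]      # p(y|x), a classical stand-in for W

joint = {(t, x, y): p_t[t] * p_x_given_t[t][x] * channel[x][y]
         for t, x, y in product(range(2), repeat=3)}

# Left side: I(X;Y|T) = H(XT) + H(YT) - H(XYT) - H(T).
lhs = (H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 2)))
       - H(joint) - H(marginal(joint, (0,))))

# Right side: the p(t)-weighted average of the per-t mutual informations.
rhs = 0.0
for t in range(2):
    joint_t = {(x, y): p_x_given_t[t][x] * channel[x][y]
               for x, y in product(range(2), repeat=2)}
    i_t = H(marginal(joint_t, (0,))) + H(marginal(joint_t, (1,))) - H(joint_t)
    rhs += p_t[t] * i_t
```

This is why coding separately within each block of constant $t$, at a rate near $I(X;B)_{\sigma_t}$, yields an overall personal rate near $I(X;B|T)_\sigma$.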

We begin by invoking Proposition 3 to obtain an $(R, n, \epsilon)$ code $\{t^n(m), \Lambda_m, \Lambda'_m\}_{m \in 2^{nR}}$ for $V^{T \to BC}$ with codewords of type $P$ satisfying $|P - p|_1 \le \delta$. Recall that for each $m$,

$$\mathrm{Tr}(\Lambda_m \otimes \Lambda'_m)\tau_m^{B^nC^n} \ge 1 - \epsilon, \tag{15}$$

where $\tau_m = \bigotimes_i \tau_{t_i(m)}$.

For each $t$, define the integer $n_t = nP(t)$, as well as $\epsilon_t = \epsilon P(t)$, $\delta_t = \delta P(t)$ and $R_t = I(X;B)_{\sigma_t} - \delta_t < |\mathcal{X}|$. It follows from Proposition 2 that for each $t$, there exists an $(R_t, n_t, \epsilon_t)$ random HSW code $\{X^{n_t}(k_t|t), \Lambda_{k_t}\}_{k_t \in 2^{n_tR_t}}$ (here, $\{X^{n_t}(k_t|t)\}$ is just a doubly indexed family of random variables) for the channel $W^{X \to B}$ to Bob which satisfies

$$\mathbb{E}\, 2^{-n_tR_t} \sum_{k_t=1}^{2^{n_tR_t}} \mathrm{Tr}\,\rho_{k_t}^{B_t}\Lambda_{k_t} \ge 1 - \epsilon_t, \tag{16}$$

where the expectation is over the randomness in the HSW codes. Above, we have abbreviated $B_t = B^{n_t}$ and taken

$$\rho_{k_t} = \bigotimes_{i=1}^{n_t} \rho_{X_i(k_t|t)}.$$

Each $X_i(k_t|t)$ is chosen independently according to $p(x|t)$, so that $\mathbb{E}\,\rho_{k_t} = \tau_t^{\otimes n_t}$. Observe that by the symmetry of the random code construction, (16) may be equivalently expressed as

$$\mathbb{E}\,\mathrm{Tr}\,\rho_{k_t}^{B_t}\Lambda_{k_t} \ge 1 - \epsilon_t.$$
Noting that the personal rate to Bob is given by

$$R_B = \sum_t \frac{n_t}{n} R_t = \sum_t P(t)R_t \ge \sum_t p(t)R_t - |P - p|_1 |\mathcal{X}| \ge I(X;B|T) - (|\mathcal{X}| + 1)\delta,$$

and also that $R_B \le I(X;B|T)$, we may uniquely identify any message $k \in 2^{nR_B}$ for Bob with a collection of messages $\{k_t \in 2^{n_tR_t}\}_t$. Recalling that all of the codewords $\{t^n(m)\}_{m \in 2^{nR}}$ are of the same type and setting $d = |\mathcal{T}|$, we may assume w.l.o.g. that $t^n(1) = 1^{n_1}2^{n_2} \cdots d^{n_d}$, so that we may identify a collection of permutations $\{\pi(m) : \mathcal{T}^n \to \mathcal{T}^n\}$ for which $t^n(m) = \pi(m)(t^n(1))$. By letting these permutations act on $\mathcal{X}^n$ in the same way, we may define Alice's (random) encoding via

$$X^n(m,k) = \pi(m)\bigl(X^{n_1}(k_1|1)\, X^{n_2}(k_2|2) \cdots X^{n_d}(k_d|d)\bigr).$$
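The permutation step in Alice's encoding can be sketched as follows: the per-$t$ codewords are concatenated in the canonical order $1^{n_1}2^{n_2}\cdots d^{n_d}$ and then scattered to the positions where the common codeword $t^n(m)$ takes each value. The function names and the toy symbols are hypothetical, and the sketch handles only the classical bookkeeping, not the quantum states or measurements.

```python
def sorting_permutation(t_seq):
    """Positions of t_seq, listed so that reading them in order sorts t_seq."""
    return sorted(range(len(t_seq)), key=lambda i: (t_seq[i], i))

def encode(t_seq, sub_codewords):
    """Place each per-t codeword on the positions where t_seq equals t."""
    order = sorting_permutation(t_seq)
    # Canonical arrangement: the block for t = 1 first, then t = 2, and so on.
    canonical = [x for t in sorted(sub_codewords) for x in sub_codewords[t]]
    assert len(canonical) == len(t_seq)
    x_seq = [None] * len(t_seq)
    for src, dst in enumerate(order):
        x_seq[dst] = canonical[src]
    return x_seq

def decode_blocks(t_seq, received):
    """Bob's inverse permutation: regroup the outputs into one block per t."""
    blocks = {}
    for t, y in zip(t_seq, received):
        blocks.setdefault(t, []).append(y)
    return blocks

t_seq = [2, 1, 2, 1, 1, 2]                       # a common codeword t^n(m)
sub = {1: ["a", "b", "c"], 2: ["x", "y", "z"]}   # one codeword per value of t
x_seq = encode(t_seq, sub)
blocks = decode_blocks(t_seq, x_seq)
```

Once Bob has decoded $m$, and hence knows $t^n(m)$, regrouping his outputs by the value of $t$ recovers the blocks on which the individual HSW decoders act.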



We abbreviate $\rho_{mk} = \rho_{X^n(m,k)}$, observing that for each $k$, we have $\mathbb{E}\,\rho_{mk} = \tau_m$.

To decode, Bob first measures $\{\Lambda_m\}$ while Charlie measures $\{\Lambda'_m\}$, after which they declare their respective results to be the common message $m$. Next, Bob will permute his $B^n$ systems according to $\pi^{-1}(m)$, obtaining a state close to $\rho_{1k}^{B^n}$. For each $t$, he then measures each block of $n_t$ outputs with the corresponding $\{\Lambda_{k_t}\}$ to obtain $(k_1, \ldots, k_d) = k$, which he declares as his personal message. Bob's overall procedure can be summarized in terms of the POVM $\{\Lambda_{mk}\}$, defined as $\Lambda_{mk} = \sqrt{\Lambda_m}\,\Lambda_{k|m}\sqrt{\Lambda_m}$, where we take

$$\Lambda_{k|m} = \pi(m)\Bigl(\bigotimes_t \Lambda_{k_t}\Bigr),$$

with $\pi(m)$ now acting to permute $B^n$ in the obvious way.
Defining 



Pmk — 
Pmk 

we estimate 

EP mk 



Tr(A mfc <g> A' m )p^ k 



B n C n 



> 
> 



ETr^; c "A fc | m 

ETr Pmfc A fc|m 
ETr Pmk A k\m 



®\p?nl C " 



B"C" 
Pmk 



I A- - 



The first inequality is by Lemma 4 and the second by 
Lemma 1. We may now derandomize, concluding that there 
is a particular value of the common randomness such that 



2" H 2 nH B 

2~n{R B +R) 

m=l k=l 



Pn 



According to Markov's inequality, half of the values of $m$ satisfy $P_m \ge 1 - 2\epsilon^{1/4}$. Among those, half of the corresponding $k$'s are such that $P_{mk} \ge 1 - \sqrt{1 - P_m} \ge 1 - 2\epsilon^{1/8}$. By only using those $m$'s, the common rate $R$ is reduced by a negligible $\frac{1}{n}$. For each such $m$, throwing out the worst half of the $k$'s reduces $R_B$ by the same amount. This completes the proof.
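The expurgation step can be illustrated with a small sketch. It uses the generic form of Markov's inequality (fewer than half of the codewords can have error above twice the average), which is a simplification of the specific constants in the proof above; the success probabilities are made-up values and the function names are hypothetical.

```python
import math

def expurgate(success_probs):
    """Keep the better half of the codewords, ranked by success probability."""
    ranked = sorted(success_probs, reverse=True)
    return ranked[: len(ranked) // 2]

def rate_loss(num_codewords, n):
    """Rate lost, in bits per channel use, when half the codewords are kept."""
    return (math.log2(num_codewords) - math.log2(num_codewords // 2)) / n

n = 20
num_codewords = 2 ** 10                      # rate R = 1/2 at blocklength n = 20
errors = [(i % 8) * 0.01 for i in range(num_codewords)]
success = [1 - e for e in errors]

avg_error = sum(errors) / len(errors)
kept = expurgate(success)
worst_kept_error = 1 - min(kept)
loss = rate_loss(num_codewords, n)           # equals 1/n
```

Every surviving codeword has error at most twice the original average, and the price is exactly one bit of message, i.e. a rate reduction of $1/n$.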

■ 

Proof: (Converse) Assuming that (Rb,R) is achievable, let 
{x n (m, k)}, {A m j.} and {A' n } comprise any (R,n,e n ) code 
in the achieving sequence. Setting 



n 



MKX" 
ink 



we write 



MKX"B"C n 



\m)(m\ ® \k)(k\ ® \x n (m,k)}(x n (m,k)\, 



m=l fe=l 



Supposing that Bob stores his decoded messages in the registers $K_B$ and $M_B$, while Charlie stores his in $M_C$, let $\Omega^{MKM_BM_CK_B}$ be the joint state after the decoding. Then, for some $\epsilon'_n \to 0$, we have

$$\begin{aligned}
nR &= H(M) \\
&\le I(M;M_C)_\Omega + n\epsilon'_n \\
&\le I(M;C^n)_\omega + n\epsilon'_n.
\end{aligned} \tag{17}$$

The second line is by Fano's inequality (see e.g. [5]) and the third is by the Holevo bound [19]. A similar argument yields $nR \le I(M;B^n) + n\epsilon''_n$. Finally, we bound

$$\begin{aligned}
nR_B &= H(K) \\
&\le I(K;K_B)_\Omega + n\epsilon'''_n \\
&\le I(K;B^n)_\omega + n\epsilon'''_n \\
&\le I(K;B^nM)_\omega + n\epsilon'''_n \\
&= I(K;B^n|M)_\omega + n\epsilon'''_n \\
&\le I(X^n;B^n|M)_\omega + n\epsilon'''_n.
\end{aligned} \tag{18}$$

The middle four lines are by Fano's inequality, the Holevo bound, data processing, and the independence of $K$ and $M$. The last inequality uses the Markov chain $KM - X^n - B^n$. Choosing $T = M$ completes the proof. ■



HETTpf'A^-VTe 

t 

> i - Vi. 



> 1 



B. Proof of Theorem 2 

By data processing, I(X; C) a < I(X; B) a . Hence, the cod- 
ing theorem follows from the previous one. It only remains to 
single-letterize the bounds appearing in the previous converse. 
Proof: (Converse) We begin by rewriting the conditional information from (18):

$$\begin{aligned}
I(X^n;B^n|M) &= H(B^n|M) - H(B^n|X^nM) \\
&= \sum_{i=1}^n \bigl[H(B_i|B^{i-1}M) - H(B_i|X^nB^{i-1}M)\bigr] \\
&= \sum_{i=1}^n \bigl[H(B_i|B^{i-1}M) - H(B_i|X_iB^{i-1}M)\bigr] \\
&= \sum_{i=1}^n I(X_i;B_i|MB^{i-1}) \\
&= nI(X_S;B_S|MB^{S-1}S) \\
&= nI(X;B|T).
\end{aligned}$$

The third line holds because of the Markov chain

$$X^{i-1}X_{i+1}^n - X_iMB^{i-1} - B_i,$$

where we abbreviate $X_{i+1}^n = X_{i+1} \cdots X_n$ for $i < n$, setting it equal to a constant when $i = n$. To see that this is a Markov chain, note that the left recovery map is deterministic, while the right recovery map prepares the appropriate state of $B_i$ given the value of $X_i$. In the remaining steps, we define $S \sim \mathrm{unif}\{1, \ldots, n\}$. The last identifies $T = SMB^{S-1}$ and $X = X_S$. We continue bounding (17):

$$\begin{aligned}
I(M;C^n) &= \sum_{i=1}^n I(M;C_i|C^{i-1}) \\
&= \sum_{i=1}^n \bigl[H(C_i|C^{i-1}) - H(C_i|MC^{i-1})\bigr] \\
&\le \sum_{i=1}^n \bigl[H(C_i) - H(C_i|MC^{i-1})\bigr] \\
&\le \sum_{i=1}^n \bigl[H(C_i) - H(C_i|MB^{i-1})\bigr] \\
&= \sum_{i=1}^n I(MB^{i-1};C_i) \\
&= nI(MB^{S-1};C_S|S) \\
&\le n\bigl[I(MB^{S-1};C_S|S) + I(S;C_S)\bigr] \\
&= nI(SMB^{S-1};C_S) \\
&= nI(T;C).
\end{aligned} \tag{19}$$

Here, the third and fourth lines follow from the fact that conditioning reduces entropy and data processing with respect to appropriate tensor products of the degrading map $\mathcal{M}^{B \to C}$. The last step identifies $C = C_S$. Observe that the commutativity of the $\{\rho_x^B\}$ was needed to identify $T$ with a classical random variable. ■

C. Proof of Theorem 3 

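Proof: (Coding theorem) Let $\mathcal{N}^{A' \to BC}$ be an arbitrary broadcast channel and fix an ensemble of bipartite pure states $\{p(t), |\phi_t\rangle^{AA'}\}$. For any $\epsilon, \delta > 0$ and sufficiently large $n$, we will show that there exists an $(R, Q, n, 3\epsilon^{1/4})$ cq entanglement generation code

$$\bigl\{|\Upsilon_m\rangle^{AA'^n},\ \mathcal{D}_1^{B^n \to M_B\hat{A}},\ \mathcal{D}_2^{C^n \to M_C}\bigr\}$$

for $\mathcal{N}^{A' \to BC}$, provided that $0 \le Q = I(A\rangle BT)_\sigma - \delta$ and $0 \le R = \min\{I(T;B)_\sigma, I(T;C)_\sigma\} - \delta$, where

$$\sigma^{TABC} = \bigoplus_t p(t)\,\mathcal{N}(\phi_t^{AA'}).$$

We do this by showing that there are two POVMs: $\{\Lambda_m\}_{m \in 2^{nR}}$ on $B^n$ and $\{\Lambda'_m\}_{m \in 2^{nR}}$ on $C^n$, as well as a collection of maps $\mathcal{D}_m^{B^n \to \hat{A}}$, for which the trace-reducing maps $\{\mathcal{D}_m(\sqrt{\Lambda_m}(\,\cdot\,)\sqrt{\Lambda_m})\}_{m \in 2^{nR}}$ are the components of $\mathcal{D}_1$, and $\mathcal{D}_2$ implements $\{\Lambda'_m\}_{m \in 2^{nR}}$.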

For each $t$, we set $\rho_t^{A'} = \mathrm{Tr}_A \phi_t$ and $\tau_t^{BC} = \mathcal{N}(\rho_t)$, defining a cq channel $V^{T \to BC}$ with conditional density matrices $\tau_t^{BC}$. As in the previous coding theorem, we invoke Proposition 3 to obtain, for sufficiently large $n$, an $(R, n, \epsilon)$ code $\{t^n(m), \Lambda_m, \Lambda'_m\}_{m \in 2^{nR}}$ for $V$ with codewords of type $P$ satisfying $|P - p|_1 \le \epsilon$. For each $m$, we write $\rho_m^{A'^n} = \bigotimes_i \rho_{t_i(m)}$, recalling that

$$\mathrm{Tr}(\Lambda_m \otimes \Lambda'_m)\,\mathcal{N}^{\otimes n}(\rho_m^{A'^n}) \ge 1 - \epsilon. \tag{20}$$

As in the direct coding part of the proof of Theorem 1, we define $n_t = nP(t)$, $\epsilon_t = \epsilon P(t)$ and $\delta_t = \delta P(t)$. We also assume that for $|\mathcal{T}| = d$, the first codeword is $t^n(1) = 1^{n_1}2^{n_2} \cdots d^{n_d}$ so that there are permutations $\pi(m)$ of $\mathcal{T}^n$ satisfying $t^n(m) = \pi(m)(t^n(1))$.

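For each $t \in \mathcal{T}$, we may set $Q_t = I_c(\rho_t, \mathcal{N}_B) - \delta_t$ and conclude from Proposition 4 that there exists a $(Q_t, n_t, \epsilon_t)$ random entanglement generation code $\{|\Upsilon_t\rangle^{A_tA'^{n_t}}, \mathcal{D}_t^{B^{n_t} \to \hat{A}_t}\}$ whose average code density operator $\varrho_t^{A'^{n_t}} = \mathbb{E}\,\mathrm{Tr}_{A_t}\Upsilon_t$ satisfies

$$\|\varrho_t - \rho_t^{\otimes n_t}\|_1 \le \epsilon_t. \tag{21}$$

It is also guaranteed that for each $t$, the state $\vartheta_t^{A_tB^{n_t}C^{n_t}} = \mathcal{N}^{\otimes n_t}(\Upsilon_t^{A_tA'^{n_t}})$ created by the $t$th random quantum code approximately contains rate $Q_t$ entanglement between Alice and Bob, in the sense that

$$F\bigl(|\Phi_{Q_t}\rangle^{A_t\hat{A}_t}, \mathrm{Tr}_{C^{n_t}}\mathcal{D}_t(\vartheta_t)\bigr) \ge 1 - \epsilon_t. \tag{22}$$

Equating $A = \bigotimes_t A_t$, we make the definitions

$$|\Upsilon_1\rangle^{AA'^n} = \bigotimes_t |\Upsilon_t\rangle^{A_tA'^{n_t}}, \qquad |\Upsilon_m\rangle^{AA'^n} = (1^A \otimes \pi(m))|\Upsilon_1\rangle^{AA'^n},$$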

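where we extend $\pi(m)$ to act by permuting the registers $A'^n$ in the obvious way. Defining the average code density operator for the new code as $\varrho_m^{A'^n} = \mathbb{E}\,\mathrm{Tr}_A \Upsilon_m$, note that we can bound

$$\begin{aligned}
\|\varrho_m - \rho_m\|_1 &= \Bigl\|\bigotimes_t \varrho_t - \bigotimes_t \rho_t^{\otimes n_t}\Bigr\|_1 \\
&\le \sum_t \|\varrho_t - \rho_t^{\otimes n_t}\|_1 \\
&\le \sum_t \epsilon_t = \epsilon,
\end{aligned} \tag{23}$$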


where we have used unitary invariance of the trace norm, 
telescoping, and (21), in that order. To send the classical 
message $m$, Alice prepares the state $|\Upsilon_m\rangle^{AA'^n}$. The structure of the decoder is similar to that from the proof of Theorem 1. Bob and Charlie begin by performing their respective measurements, in order to ascertain the classical message. Then Bob permutes his output systems accordingly and applies the quantum decoder $\mathcal{D} = \bigotimes_t \mathcal{D}_t$.

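We will write the joint state after Alice sends her encoding through the channel as

$$\vartheta_m^{AB^nC^n} = \mathcal{N}^{\otimes n}(\Upsilon_m^{AA'^n}),$$

so that in particular, the state corresponding to the first message is $\vartheta_1^{AB^nC^n} = \bigotimes_t \vartheta_t^{A_tB^{n_t}C^{n_t}}$. Note that if the decoder $\mathcal{D}^{B^n \to \hat{A}}$ is applied directly to $\vartheta_1$, the resulting Alice-Bob state is nearly maximally entangled:

$$\begin{aligned}
F\bigl(|\Phi_Q\rangle^{A\hat{A}}, \mathrm{Tr}_{C^n}\mathcal{D}(\vartheta_1)\bigr) &= \prod_t F\bigl(|\Phi_{Q_t}\rangle^{A_t\hat{A}_t}, \mathrm{Tr}_{C^{n_t}}\mathcal{D}_t(\vartheta_t)\bigr) \\
&\ge 1 - \sum_t \epsilon_t \\
&= 1 - \epsilon,
\end{aligned} \tag{24}$$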

where the first inequality is by (22). Next, define the subnor- 
malized density matrices 



which depend on the shared randomness in the quantum code, and are proportional to the states which result when Bob and Charlie both correctly learn the message $m$. This happens with probability bounded as



ETrtf 



AB n C n 



> 



Tr(A, 
Tr(A, 
Tr(A, 



®A^)ETr A < B " c " 

,®A' m W® n (g A ' n ) 
®k' m )N® n { P A ' n 



) 



1 - 2e. 



(25) 



In the second to last line, we have applied Lemma 4 along with monotonicity with respect to $\mathcal{N}^{\otimes n}$, while the last line uses the estimates (20) and (23). We may now write the expectation, over the shared randomness in the quantum code, of the fidelity $F_m$ between the state resulting from the protocol when the $m$th common message is sent and the target maximally entangled state as



EF m = EFi 



= EF(|$q) , Tro V(i9 

> l-E|$ Q -Tr c ™£W 

> l-E|$ Q -Tr c »X'(^i: 

> 1 - 2Ve - V8 • 2e 

> 1 - Qy/l 



lAB n C r 



)) 



| -E|0i-0i| 



Here, the first line follows by the permutation symmetry of the code, while the third uses (3). The fourth is a consequence of the triangle inequality, together with monotonicity with respect to $\mathrm{Tr}_{C^n}\mathcal{D}$. The estimates in the second to last line are obtained by applying (4) to (24) (which holds without the expectation), as well as Lemma 1 to (25).

At this point, it is possible to derandomize our code. Having 
proved that 



E2 



we may conclude that there is a deterministic value of the 
shared randomness from the quantum codes yielding the same 
average error bound. By throwing out the worst half of the 
codewords, Markov's inequality implies that we are left with 
a code for which 

$$F_m \ge 1 - 3\epsilon^{1/4}$$

for every $m$, while reducing the rate by a negligible $\frac{1}{n}$. ■

Proof: (Converse) Assume that $(Q_B, R)$ is achievable and let $\{|\Upsilon_m\rangle^{AA'^n}\}_{m \in 2^{nR}}$, $\mathcal{D}_1^{B^n \to M_B\hat{A}}$ and $\mathcal{D}_2^{C^n \to M_C}$ be a $(Q, R, n, \epsilon_n)$ cq entanglement generation code from any achieving sequence. Defining the state



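$$\omega^{MAB^nC^n} = 2^{-nR} \sum_{m \in 2^{nR}} |m\rangle\langle m|^M \otimes \mathcal{N}^{\otimes n}(\Upsilon_m^{AA'^n}) \tag{26}$$

and setting $\Omega^{MM_BM_CA\hat{A}} = (\mathcal{D}_1 \otimes \mathcal{D}_2)(\omega)$, we may upper bound the quantum rate $Q$ via

$$\begin{aligned}
I(A\rangle B^nM)_\omega &\ge I(A\rangle\hat{A})_\Omega \\
&\ge I(A\rangle\hat{A})_{\Phi_{Q_B}} - n\epsilon'_n \\
&= nQ_B - n\epsilon'_n.
\end{aligned} \tag{27}$$

The first step is by data processing with respect to $\mathrm{Tr}_{M_B}\mathcal{D}_1$, while the second is by the Continuity Lemma 3, for some $\epsilon'_n \to 0$. The classical rate $R$ may also be bounded as

$$\begin{aligned}
nR &= H(M)_\Omega \\
&\le I(M;M_C)_\Omega + n\epsilon''_n \\
&\le I(M;C^n)_\omega + n\epsilon''_n,
\end{aligned} \tag{28}$$

where $\epsilon''_n \to 0$, and we have used Fano's inequality and the Holevo bound. Another consequence of the Holevo bound is that $I(M;M_B)_\Omega \le I(M;B^n)_\omega$, yielding $nR \le \min\{I(M;B^n)_\omega, I(M;C^n)_\omega\} + n\epsilon''_n$. We have thus shown that for any $\delta > 0$, the rate pair $(Q - \delta, R - \delta)$ is contained in $Q(\mathcal{N})$. As $Q(\mathcal{N})$ is closed by definition, this completes the proof. ■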

D. Proof of Theorem 4 

As the coding theorem follows from that of Theorem 3, 
we need only single-letterize the corresponding multi-letter 
converse. 

Proof: (Converse) Under the assumption that $\mathcal{N}^{A' \to BC}$ is a generalized dephasing channel, we will further upper bound the information quantities (27) and (28) appearing in the multi-letter converse of Section III-C by appropriate single-letter quantities. We begin working with the state $\omega^{MAB^nC^n}$ from (26) which is induced by an $(R, Q, n, \epsilon_n)$ cq entanglement generation code from an achieving sequence. Recalling from (5) that the completely dephasing channel $\Delta$ sets to zero all off-diagonal matrix elements in the dephasing basis set



A 8rl (Tr^ T m ), observing that we may write 
Q* n =®p(x n \m) 

for some conditional probabilities p{x n \m). Let us now define 
the state 



mE2" B 

= ^ nR E^"H^ 

mG2 nR x n 



where we abbreviate ?/) G „ E = ip^ Ei . Abbreviating 
M A '^ B to Mb and M A '~* C to A/"c, the left hand side of 
(27) can be written 

I(A)B n M) u =2~ na J2 Ic(Tr A T m ,M§ n ). 

By (6) and (7), each summand can be upper bounded as 

I c (Tr A T m ,M® n ) < H(MT(Q m ))-H((M B )f n (g m )). 

Combining these last two equations yields 

I{A)B n M) u < H{B n \M) ul -H(C n E n \M) u , 
= H(X n \M) u , - H(C n E n \M) UJ > 

where we have renamed $B^n$ to $X^n$ to emphasize its classicality. From now on, we rename $\omega'^{MB^nC^nE^n}$ to $\omega'^{MX^nC^nE^n}$ accordingly. Identifying $T_i = MX^{i-1}$, $T = ST_S$ and $XCE = X_SC_SE_S$, for $S \sim \mathrm{unif}\{1, \ldots, n\}$, observe that $S - T - XCE$ forms a Markov chain. This identification defines the state $\Omega^{TXCE}$, for which



H(X n \M) u 



Y.HiX^MX'- 1 ) 

i=X 

nH{X\T) n . 



By data processing with respect to appropriate tensor products 
of the map \x)(x\ i— > ipx E > we mav u PP er bound 

71 



< 



= -nH{CE\T) n , 

obtaining ±J(A)B"A/) W < H{X\T) n - H(CE\T) n . 

It is perhaps instructive to see that O can be explicitly 
written as Q TXCE = Q) t p{t)£L XCE , where we take T = 
M x 1+J s A" 5 " 1 (here A" is the empty set), and 

^F-,=@p{x s \m,x^ E . 

We now continue by bounding the mutual information in (28) 
via 

I(M;C n )„ = I(M;C n )^ 
< nI(T;C) a . 



Here, the first step is because M§ n o A®" = M§ n , which 
follows from (6) because Mc = ^e(M'b)c, while the second 
follows from manipulations which are identical to those used 
to bound (19) in the converse to Theorem 2; the only differ- 
ences are that we use data processing with respect to tensor 
products of the map \x)(x\ i— ► Tr Eip x and relabel B 1 ^ 1 to 
X l ~ l . This proves the claim. ■ 

E. Proof of Theorem 5 

Proof: (Coding theorem) Letting $\mathcal{U}^{A' \to BC}$ be an arbitrary isometry, we set $\mathcal{N}_B = \mathrm{Tr}_C\,\mathcal{U}$. For any bipartite pure state ensemble $\{p(t), |\phi_t\rangle^{AA'}\}$ and any $\epsilon > 0$, the previous coding theorem shows (relabeling $R$ to $Q$ and $Q$ to $Q_B$) that as long as $n$ is large enough, there is a $(Q, Q_B, n, 6\sqrt{\epsilon})$ cq entanglement generation code $\{|\Upsilon_m\rangle, \mathcal{D}_m, \{\Lambda_m\}, \{\Lambda'_m\}\}$ for $\mathcal{U}^{A' \to BC}$, provided that the rates satisfy

$$Q < \min\{I(T;B)_\sigma, I(T;C)_\sigma\}$$

and

$$Q_B < I(A\rangle BT)_\sigma.$$
These quantities are computed with respect to the state 



A"BCT 



A" A' 
t 



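We will show how to make the common classical message coherent. For each $m \in 2^{nR}$, define

$$|\Upsilon'_m\rangle^{AB^nC^n} = \mathcal{U}^{\otimes n}|\Upsilon_m\rangle^{AA'^n}$$

and observe that

$$\langle\Upsilon'_m|(1^A \otimes \Lambda_m \otimes \Lambda'_m)|\Upsilon'_m\rangle \ge 1 - \epsilon.$$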



By Lemma 2, there are thus coherent local measurements 

yB n ^B n G B an( j yyC n ->C n G c satisfying 



h GcGb (t:j(v®w)|t:j>i- 



(29) 



for each $m$, where we take $|m\rangle^{G_BG_C} = |m\rangle^{G_B}|m\rangle^{G_C}$. Now, there are local unitaries (permutations of the Hilbert space factors, in fact) $V_m^{B^n \to B^n}$ and $W_m^{C^n \to C^n}$ which satisfy

$$(V_m \otimes W_m)|\Upsilon'_m\rangle = |\Upsilon'_1\rangle^{AB^nC^n}$$

because $|\Upsilon_m\rangle$ is just a permutation of the $A'^n$ part of the fixed representative $|\Upsilon_1\rangle^{AA'^n}$. Define the controlled unitary



V 



b — >B n G b — ^ ^ 



m)(m\ (g) Vrr, 



and similarly define v^ G " Gc "- +G " Gc . Setting 

iv , i} ab^gbGc = ((yov)®(WoW))\r'j, 

we may reexpress (29) as 

We now define Alice's encoding as 



(30) 



|T) 



GAA' 



/2«Q ^— ' 



writing 



\^/\GAB n C' 1 l / (® rl \ n £\AA' n 
and also setting

^GAB-C-GsGc = ((y oV)®{Wo W)) |T') 

as before. We now bound 



2-^^(m'| G (m'| GBGc (T' 1 ||m) G |T: i 



= 2- nQ J2 



\GbGc 



( T 'ill T 



// \AB n C"G B Gc 



> 1 



(31) 



The last line uses the estimate (30). Since the construction in 
the previous coding theorem guarantees that 



£>i(Tr c ™ V{ 



lAB n C n 



))>1 



(32) 



we may then employ Lemma 5 to combine these last two 
estimates to show that the state 



qGGbGcAA _ Trc<n ^^j.iiGAB n C n G B Gc^ 

which results from the protocol satisfies 

F = F(\T Q ) GGBGc \<i> QB ) AA ,n GGBGcAA 

> 1-3(1- f(\^ Qb ) aa ,q aA 



\yGGsGc C)GGbGc\ 
I Q il ll 



> 1 - 3e - 

> 1 - 6-v/e. 



'8e 



The bound on the fidelity in the second line is from (32), 
while the bound on the trace distance in the next line is by 
application of (4) to the square-root of the fidelity in (31). 
This proves the coding theorem. ■ 
Proof: (Converse) Observe that any $(Q, Q_B, n, \epsilon)$ qq entanglement generation code is able to establish $\epsilon$-good uniform common randomness between Alice, Bob and Charlie at rate $Q$, in the sense that they generate a triple of random variables $(M_A, M_B, M_C)$ which satisfy

$$|\mathrm{dist}(M_A, M_B, M_C) - \mathrm{dist}(M, M, M)|_1 \le 2\epsilon,$$

where $M$ is uniformly distributed on $\{1, \ldots, 2^{nR}\}$. To accomplish this, Alice will measure the $G$ part of her input $\Upsilon^{GAA'^n}$ in the GHZ basis $\{|m\rangle^G\}$ at any point in the protocol, while Bob and Charlie measure their respective bases $\{|m\rangle^{G_B}\}$ and $\{|m\rangle^{G_C}\}$ after their decodings are complete. The previous converse provides an upper bound on this uniform common randomness generation rate for protocols which also generate Alice-Bob entanglement at rate $Q$, therefore proving this converse. ■



F. Proof of Theorem 6 

Proof: (Converse) As per the remarks in the previous 
converse theorem, the converse part of the proof of Theorem 4 
applies here as well. ■ 



Appendix I 
Proof of cardinality bounds for T 

In Theorem 1 (with $k = 1$), let a finite set $\mathcal{T}$ and conditional probabilities $p(x|t)$ be arbitrary. Geometrically, this amounts to fixing a $\mathcal{T}$-labeled set of points on the $\mathcal{X}$-probability simplex. We will show that given any probabilities $p(t)$ on $\mathcal{T}$, there exists another distribution $q(t)$ which puts positive mass on at most $\min\{|\mathcal{X}|, |B|^2 + |C|^2 - 1\}$ elements of $\mathcal{T}$, while satisfying

$$\begin{aligned}
I(X;C|T)_q &= I(X;C|T)_p, \\
I(T;B)_q &= I(T;B)_p, \\
I(T;C)_q &= I(T;C)_p,
\end{aligned}$$

where the subscript $q$ means the quantity is evaluated on the state

$$\sigma_q^{TXBC} = \bigoplus_{t,x} q(t)p(x|t)\rho_x^{BC}.$$



We will prove this by use of the following lemma: 

Lemma 6 (Fenchel and Eggleston [12]): Let $S \subset \mathbb{R}^n$ have at most $n$ connected components. Then any point in the convexification of $S$ can be written as a convex combination of at most $n$ points in $S$.

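It will thus be sufficient to show that the map

$$f : p(t) \mapsto \bigl(I(X;C|T)_p,\ I(T;B)_p,\ I(T;C)_p\bigr)$$

factors through an affine space of sufficiently low dimension. To this end, we decompose $f$ into a nonlinear part $f_{\mathrm{nl}}$ and an affine part

$$f_{\mathrm{aff}} : p(t) \mapsto \sum_t p(t)\bigl(I(X;B|t), \rho_t^B, \rho_t^C\bigr) = \bigl(I(X;B|T), \rho^B, \rho^C\bigr), \tag{33}$$

so that the following diagram commutes:

$$p(t) \xrightarrow{\ f_{\mathrm{aff}}\ } \bigl(I(X;B|T), \rho^B, \rho^C\bigr) \xrightarrow{\ f_{\mathrm{nl}}\ } \bigl(I(X;C|T)_p,\ I(T;B)_p,\ I(T;C)_p\bigr), \qquad f = f_{\mathrm{nl}} \circ f_{\mathrm{aff}}.$$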

We regard the affine map as producing convex combinations of the points in some affine space parameterizations of the $\{(I(X;B|t), \rho_t^B, \rho_t^C)\}_{t \in \mathcal{T}}$, weighted by the probabilities $p(t)$. As $\rho_t^B$ and $\rho_t^C$ can be specified either by their individual parameterizations or by $p(x|t)$, the more efficient representation requires at most $\min\{|\mathcal{X}| - 1, |B|^2 - 1 + |C|^2 - 1\}$ numbers. Since the first coordinate can be taken to be $I(X;B|t)$ itself, we see that at most $\min\{|\mathcal{X}|, |B|^2 + |C|^2 - 1\}$ affine parameters are required to describe (33). By continuity, the image of the $\mathcal{T}$-simplex under $f_{\mathrm{aff}}$ is connected, and so we may use the earlier lemma to infer the existence of probabilities $q(t)$ on $\mathcal{T}$ with support cardinality at most $\min\{|\mathcal{X}|, |B|^2 + |C|^2 - 1\}$, while satisfying $f(p(t)) = f(q(t))$.
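The support-reduction idea can be sketched computationally. The code below implements the closely related Carathéodory reduction, not the Fenchel-Eggleston lemma itself: given weights $p(t)$ and points $v_t \in \mathbb{R}^k$, it returns weights $q(t)$ with the same mean $\sum_t q(t)v_t$ and at most $k + 1$ nonzero entries, which is one more than the bound for connected sets used in the text. Exact rational arithmetic is used so that weights vanish exactly; the function names and the example points are assumptions made for illustration.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """A nonzero rational vector c with rows @ c = 0, or None if none exists."""
    m = [list(r) for r in rows]
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for col in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        inv = 1 / m[r][col]
        m[r] = [v * inv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    free = next((c for c in range(ncols) if c not in pivots), None)
    if free is None:
        return None
    vec = [Fraction(0)] * ncols
    vec[free] = Fraction(1)
    for row, col in zip(m, pivots):
        vec[col] = -row[free]
    return vec

def reduce_support(weights, points):
    """Shrink the support of `weights` while preserving sum_t weights[t] * points[t]."""
    q = [Fraction(w) for w in weights]
    dim = len(points[0])
    while True:
        support = [t for t, w in enumerate(q) if w > 0]
        if len(support) <= dim + 1:
            return q
        # Find c on the support with sum_t c_t = 0 and sum_t c_t * points[t] = 0.
        rows = [[Fraction(1)] * len(support)]
        rows += [[Fraction(points[t][j]) for t in support] for j in range(dim)]
        c = null_vector(rows, len(support))
        # Move along -c until the first weight reaches zero.
        step = min(q[t] / v for t, v in zip(support, c) if v > 0)
        for t, v in zip(support, c):
            q[t] -= step * v

points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
p = [Fraction(1, 5)] * 5
q = reduce_support(p, points)
mean_p = tuple(sum(w * v[j] for w, v in zip(p, points)) for j in range(2))
mean_q = tuple(sum(w * v[j] for w, v in zip(q, points)) for j in range(2))
```

In the setting of the appendix, the points would be affine parameterizations of $(I(X;B|t), \rho_t^B, \rho_t^C)$, so that preserving their $p(t)$-weighted average preserves the value of $f$.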

For Theorem 2, the degradedness of the channel implies that the $|C|^2 - 1$ affine parameters of $\rho^C$ depend affinely on those of $\rho^B$, allowing the reduction of the cardinality bound to $|\mathcal{T}| \le \min\{|\mathcal{X}|, |B|^2\}$.
For the bound of Theorem 3, we instead begin by fixing states $\{\rho_t^{A'}\}_{t \in \mathcal{T}}$. Here, the affine map outputs convex combinations of the points $(I_c(\rho_t, \mathcal{N}_B), \rho_t^B, \rho_t^C)$. As a parameterization of the possible $\rho_t^B$ and $\rho_t^C$ requires no more than $\min\{|A'|^2 - 1, |B|^2 - 1 + |C|^2 - 1\}$ coordinates, we obtain by similar reasoning as above that it suffices to take $|\mathcal{T}| \le \min\{|A'|^2, |B|^2 + |C|^2 - 1\}$.

The bound for Theorem 4 follows in the same way as that for Theorem 2, although the fact that $|B| = |\mathcal{X}|$ implies that $|\mathcal{T}| \le |\mathcal{X}|$ is sufficient. In Theorems 5 and 6, the bounds are the same as those from Theorems 3 and 4 and follow for the same reasons.